In [1]:
%%html
<link rel="stylesheet" type="text/css" href="rise.css" />

Pandas dataframes (tables)¶

Learning goals

  • You will be able to to work with tabular data sets.
  • You will appreciate how useful pandas is for exploratory data analysis.

Install Pandas¶

conda activate neu365
pip install pandas
In [2]:
import pandas as pd
/var/folders/cq/0x3wj8sx1_dbbm61zp0zqbmm0000gn/T/ipykernel_94698/4080736814.py:1: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd

Data_Cortex_Nuclear

https://www.kaggle.com/ruslankl/mice-protein-expression/version/1

Data_Cortex_Nuclear.csv

Data_Cortex_Nuclear Excel

Load a spreadsheet from file into a Pandas DataFrame.

In [3]:
df = pd.read_csv("data/Data_Cortex_Nuclear.csv")
In [4]:
df
Out[4]:
MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N ... pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N Genotype Treatment Behavior class
0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

1080 rows × 82 columns

In [5]:
df.shape
Out[5]:
(1080, 82)

Column titles.

In [6]:
df.columns
Out[6]:
Index(['MouseID', 'DYRK1A_N', 'ITSN1_N', 'BDNF_N', 'NR1_N', 'NR2A_N', 'pAKT_N',
       'pBRAF_N', 'pCAMKII_N', 'pCREB_N', 'pELK_N', 'pERK_N', 'pJNK_N',
       'PKCA_N', 'pMEK_N', 'pNR1_N', 'pNR2A_N', 'pNR2B_N', 'pPKCAB_N',
       'pRSK_N', 'AKT_N', 'BRAF_N', 'CAMKII_N', 'CREB_N', 'ELK_N', 'ERK_N',
       'GSK3B_N', 'JNK_N', 'MEK_N', 'TRKA_N', 'RSK_N', 'APP_N', 'Bcatenin_N',
       'SOD1_N', 'MTOR_N', 'P38_N', 'pMTOR_N', 'DSCR1_N', 'AMPKA_N', 'NR2B_N',
       'pNUMB_N', 'RAPTOR_N', 'TIAM1_N', 'pP70S6_N', 'NUMB_N', 'P70S6_N',
       'pGSK3B_N', 'pPKCG_N', 'CDK5_N', 'S6_N', 'ADARB1_N', 'AcetylH3K9_N',
       'RRP1_N', 'BAX_N', 'ARC_N', 'ERBB4_N', 'nNOS_N', 'Tau_N', 'GFAP_N',
       'GluR3_N', 'GluR4_N', 'IL1B_N', 'P3525_N', 'pCASP9_N', 'PSD95_N',
       'SNCA_N', 'Ubiquitin_N', 'pGSK3B_Tyr216_N', 'SHH_N', 'BAD_N', 'BCL2_N',
       'pS6_N', 'pCFOS_N', 'SYP_N', 'H3AcK18_N', 'EGR1_N', 'H3MeK4_N',
       'CaNA_N', 'Genotype', 'Treatment', 'Behavior', 'class'],
      dtype='object')

Each row has an associated index.

In [7]:
df.index
Out[7]:
RangeIndex(start=0, stop=1080, step=1)

Row indexes are just 0-based indices.

In [8]:
df
Out[8]:
MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N ... pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N Genotype Treatment Behavior class
0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

1080 rows × 82 columns

Row indexes do NOT have to be numeric or in any particular order, but they must be unique.

In [9]:
df.index = [f"Mouse {i}" for i in df.index]
df
Out[9]:
MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N ... pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N Genotype Treatment Behavior class
Mouse 0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
Mouse 1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
Mouse 2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
Mouse 3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
Mouse 4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

1080 rows × 82 columns

In [10]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1080 entries, Mouse 0 to Mouse 1079
Data columns (total 82 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   MouseID          1080 non-null   object 
 1   DYRK1A_N         1077 non-null   float64
 2   ITSN1_N          1077 non-null   float64
 3   BDNF_N           1077 non-null   float64
 4   NR1_N            1077 non-null   float64
 5   NR2A_N           1077 non-null   float64
 6   pAKT_N           1077 non-null   float64
 7   pBRAF_N          1077 non-null   float64
 8   pCAMKII_N        1077 non-null   float64
 9   pCREB_N          1077 non-null   float64
 10  pELK_N           1077 non-null   float64
 11  pERK_N           1077 non-null   float64
 12  pJNK_N           1077 non-null   float64
 13  PKCA_N           1077 non-null   float64
 14  pMEK_N           1077 non-null   float64
 15  pNR1_N           1077 non-null   float64
 16  pNR2A_N          1077 non-null   float64
 17  pNR2B_N          1077 non-null   float64
 18  pPKCAB_N         1077 non-null   float64
 19  pRSK_N           1077 non-null   float64
 20  AKT_N            1077 non-null   float64
 21  BRAF_N           1077 non-null   float64
 22  CAMKII_N         1077 non-null   float64
 23  CREB_N           1077 non-null   float64
 24  ELK_N            1062 non-null   float64
 25  ERK_N            1077 non-null   float64
 26  GSK3B_N          1077 non-null   float64
 27  JNK_N            1077 non-null   float64
 28  MEK_N            1073 non-null   float64
 29  TRKA_N           1077 non-null   float64
 30  RSK_N            1077 non-null   float64
 31  APP_N            1077 non-null   float64
 32  Bcatenin_N       1062 non-null   float64
 33  SOD1_N           1077 non-null   float64
 34  MTOR_N           1077 non-null   float64
 35  P38_N            1077 non-null   float64
 36  pMTOR_N          1077 non-null   float64
 37  DSCR1_N          1077 non-null   float64
 38  AMPKA_N          1077 non-null   float64
 39  NR2B_N           1077 non-null   float64
 40  pNUMB_N          1077 non-null   float64
 41  RAPTOR_N         1077 non-null   float64
 42  TIAM1_N          1077 non-null   float64
 43  pP70S6_N         1077 non-null   float64
 44  NUMB_N           1080 non-null   float64
 45  P70S6_N          1080 non-null   float64
 46  pGSK3B_N         1080 non-null   float64
 47  pPKCG_N          1080 non-null   float64
 48  CDK5_N           1080 non-null   float64
 49  S6_N             1080 non-null   float64
 50  ADARB1_N         1080 non-null   float64
 51  AcetylH3K9_N     1080 non-null   float64
 52  RRP1_N           1080 non-null   float64
 53  BAX_N            1080 non-null   float64
 54  ARC_N            1080 non-null   float64
 55  ERBB4_N          1080 non-null   float64
 56  nNOS_N           1080 non-null   float64
 57  Tau_N            1080 non-null   float64
 58  GFAP_N           1080 non-null   float64
 59  GluR3_N          1080 non-null   float64
 60  GluR4_N          1080 non-null   float64
 61  IL1B_N           1080 non-null   float64
 62  P3525_N          1080 non-null   float64
 63  pCASP9_N         1080 non-null   float64
 64  PSD95_N          1080 non-null   float64
 65  SNCA_N           1080 non-null   float64
 66  Ubiquitin_N      1080 non-null   float64
 67  pGSK3B_Tyr216_N  1080 non-null   float64
 68  SHH_N            1080 non-null   float64
 69  BAD_N            867 non-null    float64
 70  BCL2_N           795 non-null    float64
 71  pS6_N            1080 non-null   float64
 72  pCFOS_N          1005 non-null   float64
 73  SYP_N            1080 non-null   float64
 74  H3AcK18_N        900 non-null    float64
 75  EGR1_N           870 non-null    float64
 76  H3MeK4_N         810 non-null    float64
 77  CaNA_N           1080 non-null   float64
 78  Genotype         1080 non-null   object 
 79  Treatment        1080 non-null   object 
 80  Behavior         1080 non-null   object 
 81  class            1080 non-null   object 
dtypes: float64(77), object(5)
memory usage: 700.3+ KB

Some statistics for each column.

In [11]:
df.describe()
Out[11]:
DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N pELK_N ... SHH_N BAD_N BCL2_N pS6_N pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N
count 1077.000000 1077.000000 1077.000000 1077.000000 1077.000000 1077.000000 1077.000000 1077.000000 1077.000000 1077.000000 ... 1080.000000 867.000000 795.000000 1080.000000 1005.000000 1080.000000 900.000000 870.000000 810.000000 1080.000000
mean 0.425810 0.617102 0.319088 2.297269 3.843934 0.233168 0.181846 3.537109 0.212574 1.428682 ... 0.226676 0.157914 0.134762 0.121521 0.131053 0.446073 0.169609 0.183135 0.205440 1.337784
std 0.249362 0.251640 0.049383 0.347293 0.933100 0.041634 0.027042 1.295169 0.032587 0.466904 ... 0.028989 0.029537 0.027417 0.014276 0.023863 0.066432 0.059402 0.040406 0.055514 0.317126
min 0.145327 0.245359 0.115181 1.330831 1.737540 0.063236 0.064043 1.343998 0.112812 0.429032 ... 0.155869 0.088305 0.080657 0.067254 0.085419 0.258626 0.079691 0.105537 0.101787 0.586479
25% 0.288121 0.473361 0.287444 2.057411 3.155678 0.205755 0.164595 2.479834 0.190823 1.203665 ... 0.206395 0.136424 0.115554 0.110839 0.113506 0.398082 0.125848 0.155121 0.165143 1.081423
50% 0.366378 0.565782 0.316564 2.296546 3.760855 0.231177 0.182302 3.326520 0.210594 1.355846 ... 0.224000 0.152313 0.129468 0.121626 0.126523 0.448459 0.158240 0.174935 0.193994 1.317441
75% 0.487711 0.698032 0.348197 2.528481 4.440011 0.257261 0.197418 4.481940 0.234595 1.561316 ... 0.241655 0.174017 0.148235 0.131955 0.143652 0.490773 0.197876 0.204542 0.235215 1.585824
max 2.516367 2.602662 0.497160 3.757641 8.482553 0.539050 0.317066 7.464070 0.306247 6.113347 ... 0.358289 0.282016 0.261506 0.158748 0.256529 0.759588 0.479763 0.360692 0.413903 2.129791

8 rows × 77 columns

Let's get rid of all the _N in the column titles to make things easier to look at.

In [12]:
coltitles = list(df.columns)
for i in range(len(coltitles)):
    if coltitles[i].endswith('_N'):
        coltitles[i] = coltitles[i][:-2]
df.columns = coltitles
df
Out[12]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
Mouse 0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
Mouse 1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
Mouse 2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
Mouse 3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
Mouse 4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

1080 rows × 82 columns

Index/slice into the rows of the DataFrame just like a NumPy array.

In [13]:
df[:5]  # first 5 rows
Out[13]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
Mouse 0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
Mouse 1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
Mouse 2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
Mouse 3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
Mouse 4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m

5 rows × 82 columns

In [14]:
df[-9::2]  # last 9 rows every other
Out[14]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
Mouse 1071 J3295_7 0.318448 0.509701 0.290576 2.314752 3.041115 0.229135 0.175855 2.881429 0.206036 ... 0.127265 0.405127 0.307199 0.192350 0.269951 1.340743 Ts65Dn Saline S/C t-SC-s
Mouse 1073 J3295_9 0.280946 0.528821 0.299645 2.290991 3.243548 0.246021 0.195797 2.946067 0.221759 ... 0.105369 0.408533 0.277181 0.182646 0.237009 1.297315 Ts65Dn Saline S/C t-SC-s
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

5 rows × 82 columns

Get a specific column by name.

In [15]:
df['Genotype']
Out[15]:
Mouse 0       Control
Mouse 1       Control
Mouse 2       Control
Mouse 3       Control
Mouse 4       Control
               ...   
Mouse 1075     Ts65Dn
Mouse 1076     Ts65Dn
Mouse 1077     Ts65Dn
Mouse 1078     Ts65Dn
Mouse 1079     Ts65Dn
Name: Genotype, Length: 1080, dtype: object

Get multiple columns by name in any order.

In [16]:
df[['Genotype', 'BDNF']]
Out[16]:
Genotype BDNF
Mouse 0 Control 0.430175
Mouse 1 Control 0.411770
Mouse 2 Control 0.418309
Mouse 3 Control 0.358626
Mouse 4 Control 0.358802
... ... ...
Mouse 1075 Ts65Dn 0.254860
Mouse 1076 Ts65Dn 0.251638
Mouse 1077 Ts65Dn 0.234118
Mouse 1078 Ts65Dn 0.243974
Mouse 1079 Ts65Dn 0.256564

1080 rows × 2 columns

  • Single column $\rightarrow$ Series object
  • Multiple columns $\rightarrow$ DataFrame oject
In [17]:
type(df['Genotype']), type(df[['Genotype', 'BDNF']])
Out[17]:
(pandas.core.series.Series, pandas.core.frame.DataFrame)

But you can still get a single column as a DataFrame object if you index with a list that has just a single column title.

In [18]:
df[['Genotype']]
Out[18]:
Genotype
Mouse 0 Control
Mouse 1 Control
Mouse 2 Control
Mouse 3 Control
Mouse 4 Control
... ...
Mouse 1075 Ts65Dn
Mouse 1076 Ts65Dn
Mouse 1077 Ts65Dn
Mouse 1078 Ts65Dn
Mouse 1079 Ts65Dn

1080 rows × 1 columns

It is easy to combine column and row indexing.

In [19]:
df['Genotype'][1:5]  # rows 1-4 in Genotype column
Out[19]:
Mouse 1    Control
Mouse 2    Control
Mouse 3    Control
Mouse 4    Control
Name: Genotype, dtype: object
In [20]:
df[['Genotype', 'BDNF']][-9::3]  # last 9 rows every third for two columns
Out[20]:
Genotype BDNF
Mouse 1071 Ts65Dn 0.290576
Mouse 1074 Ts65Dn 0.254439
Mouse 1077 Ts65Dn 0.234118

Exercise¶

Get the class column for the mice in rows 2-4.

In [ ]:
 

Exercise Key¶

Get the class column for the mice in rows 2-4.

In [21]:
df[['class']][2:5]
Out[21]:
class
Mouse 2 c-CS-m
Mouse 3 c-CS-m
Mouse 4 c-CS-m

Exercise¶

Get the class and Treatment columns for the mice in rows 2-4.

In [ ]:
 

Exercise Key¶

Get the class and Treatment columns for the mice in rows 2-4.

In [22]:
df[['class', 'Treatment']][2:5]
Out[22]:
class Treatment
Mouse 2 c-CS-m Memantine
Mouse 3 c-CS-m Memantine
Mouse 4 c-CS-m Memantine

iloc --> slice like NumPy¶

iloc[rows, columns]: 2D slice just like numpy

In [23]:
df.iloc[3:9:2, 5:10]
Out[23]:
NR2A pAKT pBRAF pCAMKII pCREB
Mouse 3 4.979503 0.222886 0.176463 2.152301 0.207004
Mouse 5 4.807635 0.218578 0.176233 2.141282 0.195188
Mouse 7 4.268735 0.214834 0.179668 2.007985 0.189803

loc --> slice by name¶

loc[rows, columns]

first:last: Note that both the first and last are included (finally, sanity!).

In [24]:
df.loc['Mouse 1077':'Mouse 1079', 'Genotype':'class']
Out[24]:
Genotype Treatment Behavior class
Mouse 1077 Ts65Dn Saline S/C t-SC-s
Mouse 1078 Ts65Dn Saline S/C t-SC-s
Mouse 1079 Ts65Dn Saline S/C t-SC-s

List of [rows,...], [columns,...] in any order.

In [25]:
df.loc[['Mouse 3', 'Mouse 82', 'Mouse 500'], ['Genotype', 'BDNF']]
Out[25]:
Genotype BDNF
Mouse 3 Control 0.358626
Mouse 82 Control 0.299948
Mouse 500 Control 0.295143

Exercise¶

Use loc to get the class and Treatment for the mice from Mouse 53 to Mouse 60.

In [ ]:
 

Exercise Key¶

Use loc to get the class and Treatment for the mice from Mouse 53 to Mouse 60.

In [26]:
df.loc['Mouse 53':'Mouse 60', ['class', 'Treatment']]
Out[26]:
class Treatment
Mouse 53 c-CS-m Memantine
Mouse 54 c-CS-m Memantine
Mouse 55 c-CS-m Memantine
Mouse 56 c-CS-m Memantine
Mouse 57 c-CS-m Memantine
Mouse 58 c-CS-m Memantine
Mouse 59 c-CS-m Memantine
Mouse 60 c-CS-m Memantine

Select rows with a logical array¶

In [27]:
df['Genotype']
Out[27]:
Mouse 0       Control
Mouse 1       Control
Mouse 2       Control
Mouse 3       Control
Mouse 4       Control
               ...   
Mouse 1075     Ts65Dn
Mouse 1076     Ts65Dn
Mouse 1077     Ts65Dn
Mouse 1078     Ts65Dn
Mouse 1079     Ts65Dn
Name: Genotype, Length: 1080, dtype: object
In [28]:
df['Genotype'] == 'Ts65Dn'
Out[28]:
Mouse 0       False
Mouse 1       False
Mouse 2       False
Mouse 3       False
Mouse 4       False
              ...  
Mouse 1075     True
Mouse 1076     True
Mouse 1077     True
Mouse 1078     True
Mouse 1079     True
Name: Genotype, Length: 1080, dtype: bool
In [29]:
has_down_syndrome = (df['Genotype'] == 'Ts65Dn')

# only rows for down syndrome mice
df[has_down_syndrome]
Out[29]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
Mouse 570 3414_1 0.838053 1.006324 0.364953 2.603061 4.207680 0.202548 0.164055 2.222619 0.230776 ... NaN 0.400078 0.112587 0.107095 0.123739 1.829242 Ts65Dn Memantine C/S t-CS-m
Mouse 571 3414_2 0.938245 1.077099 0.366149 2.703612 4.484228 0.197465 0.169923 2.289908 0.249214 ... NaN 0.402721 0.112256 0.105537 0.128380 1.831980 Ts65Dn Memantine C/S t-CS-m
Mouse 572 3414_3 0.956488 1.108441 0.384049 2.662564 4.606357 0.203799 0.171625 2.311949 0.253610 ... NaN 0.403164 0.108300 0.114087 0.129905 1.809744 Ts65Dn Memantine C/S t-CS-m
Mouse 573 3414_4 0.731738 0.944241 0.324365 2.320243 3.621250 0.198649 0.162240 2.257041 0.203458 ... NaN 0.388400 0.117319 0.115926 0.133304 1.795498 Ts65Dn Memantine C/S t-CS-m
Mouse 574 3414_5 0.815503 1.005562 0.335976 2.379290 3.641657 0.217870 0.164142 2.291124 0.209941 ... NaN 0.374617 0.120861 0.110322 0.133941 1.823529 Ts65Dn Memantine C/S t-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

510 rows × 82 columns

In [30]:
got_saline = (df['Treatment'] == 'Saline')

# only rows for down syndrome mice that recieved saline treatment
df[has_down_syndrome & got_saline]
Out[30]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
Mouse 840 18899_1 0.506200 0.696046 0.316019 2.204591 4.154323 0.254859 0.180295 2.557473 0.192694 ... NaN 0.397663 0.155484 NaN 0.187052 1.357802 Ts65Dn Saline C/S t-CS-s
Mouse 841 18899_2 0.523760 0.746212 0.324897 2.285640 4.322314 0.268767 0.194387 2.648244 0.198864 ... NaN 0.388920 0.164507 NaN 0.182911 1.327418 Ts65Dn Saline C/S t-CS-s
Mouse 842 18899_3 0.518612 0.733233 0.356137 2.330148 4.631455 0.272468 0.186284 2.624078 0.192656 ... NaN 0.387212 0.156970 NaN 0.179930 1.267106 Ts65Dn Saline C/S t-CS-s
Mouse 843 18899_4 0.436986 0.626614 0.295108 2.008023 3.605088 0.258317 0.183562 2.648141 0.194521 ... NaN 0.428589 0.187594 NaN 0.233242 1.329915 Ts65Dn Saline C/S t-CS-s
Mouse 844 18899_5 0.505599 0.719826 0.314600 2.194110 3.908544 0.281833 0.200539 2.804231 0.210701 ... NaN 0.416420 0.182469 NaN 0.231235 1.297778 Ts65Dn Saline C/S t-CS-s
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

240 rows × 82 columns

In [31]:
# MouseID and pCFOS levels
# for only down syndrome mice that recieved saline treatment
df[has_down_syndrome & got_saline][['MouseID', 'pCFOS']]
Out[31]:
MouseID pCFOS
Mouse 840 18899_1 NaN
Mouse 841 18899_2 NaN
Mouse 842 18899_3 NaN
Mouse 843 18899_4 NaN
Mouse 844 18899_5 NaN
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

240 rows × 2 columns

Missing values¶

In [32]:
df2 = df[has_down_syndrome & got_saline][['MouseID', 'pCFOS']]
df2
Out[32]:
MouseID pCFOS
Mouse 840 18899_1 NaN
Mouse 841 18899_2 NaN
Mouse 842 18899_3 NaN
Mouse 843 18899_4 NaN
Mouse 844 18899_5 NaN
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

240 rows × 2 columns

Get rid of any mice with missing data.

In [33]:
df2.dropna()
Out[33]:
MouseID pCFOS
Mouse 855 3476_1 0.109931
Mouse 856 3476_2 0.123052
Mouse 857 3476_3 0.114269
Mouse 858 3476_4 0.111655
Mouse 859 3476_5 0.123541
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

210 rows × 2 columns

Replace missing values with zeros.

In [34]:
df2.fillna(0)
Out[34]:
MouseID pCFOS
Mouse 840 18899_1 0.000000
Mouse 841 18899_2 0.000000
Mouse 842 18899_3 0.000000
Mouse 843 18899_4 0.000000
Mouse 844 18899_5 0.000000
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

240 rows × 2 columns

Note that generally operations do NOT alter the dataframe.

Despite having run the dropna and fillna methods, df2 still has its original missing values.

In [35]:
df2
Out[35]:
MouseID pCFOS
Mouse 840 18899_1 NaN
Mouse 841 18899_2 NaN
Mouse 842 18899_3 NaN
Mouse 843 18899_4 NaN
Mouse 844 18899_5 NaN
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

240 rows × 2 columns

To keep the results of an operation you generally have two options:

  1. Store the result in a new dataframe.
  2. Apply the operation "inplace".
In [36]:
df3 = df2.fillna(0)
df3
Out[36]:
MouseID pCFOS
Mouse 840 18899_1 0.000000
Mouse 841 18899_2 0.000000
Mouse 842 18899_3 0.000000
Mouse 843 18899_4 0.000000
Mouse 844 18899_5 0.000000
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

240 rows × 2 columns

In [37]:
df2.fillna(-1, inplace=True)
df2
Out[37]:
MouseID pCFOS
Mouse 840 18899_1 -1.000000
Mouse 841 18899_2 -1.000000
Mouse 842 18899_3 -1.000000
Mouse 843 18899_4 -1.000000
Mouse 844 18899_5 -1.000000
... ... ...
Mouse 1075 J3295_11 0.183324
Mouse 1076 J3295_12 0.175674
Mouse 1077 J3295_13 0.158296
Mouse 1078 J3295_14 0.196296
Mouse 1079 J3295_15 0.187556

240 rows × 2 columns

Logical mask for whether or not a data value is missing.

In [38]:
df2 = df[has_down_syndrome & got_saline][['MouseID', 'pCFOS']]
df2.isnull()
Out[38]:
MouseID pCFOS
Mouse 840 False True
Mouse 841 False True
Mouse 842 False True
Mouse 843 False True
Mouse 844 False True
... ... ...
Mouse 1075 False False
Mouse 1076 False False
Mouse 1077 False False
Mouse 1078 False False
Mouse 1079 False False

240 rows × 2 columns

The number of missing values in each column.

In [39]:
# sum applied to each column
df.isnull().sum()
Out[39]:
MouseID      0
DYRK1A       3
ITSN1        3
BDNF         3
NR1          3
            ..
CaNA         0
Genotype     0
Treatment    0
Behavior     0
class        0
Length: 82, dtype: int64
In [40]:
missing = df.isnull().sum()

missing.plot();
No description has been provided for this image

Import and configure matplotlib¶

In [41]:
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
plt.style.use('bmh')
plt.rcParams['figure.figsize'] = [12,4]
plt.rcParams['lines.linewidth'] = 1.5

Pandas plots use matplotlib, so all styling will be applied.

In [42]:
missing.plot();
No description has been provided for this image
In [43]:
missing.plot.bar();
No description has been provided for this image
In [44]:
missing > 10
Out[44]:
MouseID      False
DYRK1A       False
ITSN1        False
BDNF         False
NR1          False
             ...  
CaNA         False
Genotype     False
Treatment    False
Behavior     False
class        False
Length: 82, dtype: bool
In [45]:
df.loc[:,missing > 10]
Out[45]:
ELK Bcatenin BAD BCL2 pCFOS H3AcK18 EGR1 H3MeK4
Mouse 0 1.866358 3.037621 0.122652 NaN 0.108336 0.114783 0.131790 0.128186
Mouse 1 1.761047 2.921882 0.116682 NaN 0.104315 0.111974 0.135103 0.131119
Mouse 2 1.765544 2.944136 0.118508 NaN 0.106219 0.111883 0.133362 0.127431
Mouse 3 1.286277 2.500204 0.132781 NaN 0.111262 0.130405 0.147444 0.146901
Mouse 4 1.324695 2.456560 0.129954 NaN 0.110694 0.118481 0.140314 0.148380
... ... ... ... ... ... ... ... ...
Mouse 1075 0.896430 1.828208 0.190483 NaN 0.183324 0.318782 0.204660 0.328327
Mouse 1076 0.958879 1.883370 0.190463 NaN 0.175674 0.325639 0.200415 0.293435
Mouse 1077 0.762892 1.494208 0.216682 NaN 0.158296 0.321306 0.229193 0.355213
Mouse 1078 0.781893 1.571820 0.222263 NaN 0.196296 0.335936 0.251317 0.365353
Mouse 1079 0.884735 1.742768 0.227606 NaN 0.187556 0.335062 0.252995 0.365278

1080 rows × 8 columns

You might think about not using the handful of features with a lot of missing values for your analysis.

In [46]:
# drop columns with a lot of missing data
dfok = df.loc[:,missing <= 10]
dfok
Out[46]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... Ubiquitin pGSK3B_Tyr216 SHH pS6 SYP CaNA Genotype Treatment Behavior class
Mouse 0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 1.044979 0.831557 0.188852 0.106305 0.427099 1.675652 Control Memantine C/S c-CS-m
Mouse 1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 1.009883 0.849270 0.200404 0.106592 0.441581 1.743610 Control Memantine C/S c-CS-m
Mouse 2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.996848 0.846709 0.193685 0.108303 0.435777 1.926427 Control Memantine C/S c-CS-m
Mouse 3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.990225 0.833277 0.192112 0.103184 0.391691 1.700563 Control Memantine C/S c-CS-m
Mouse 4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.997775 0.878668 0.205604 0.104784 0.434154 1.839730 Control Memantine C/S c-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 1.261651 0.962942 0.275547 0.115806 0.374088 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 1.254872 0.983690 0.283207 0.113614 0.375259 1.364478 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 1.242248 0.976609 0.290843 0.118948 0.422121 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 1.301071 0.989286 0.306701 0.125295 0.397676 1.404031 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 1.267120 1.020383 0.292330 0.118899 0.420347 1.370999 Ts65Dn Saline S/C t-SC-s

1080 rows × 74 columns

Exercise¶

Get the MouseID for all mice missing pCFOS measurements.

In [ ]:
 

Exercise Key¶

Get the MouseID for all mice missing pCFOS measurements.

In [47]:
missing_pCFOS = df['pCFOS'].isnull()

df['MouseID'][missing_pCFOS]
Out[47]:
Mouse 15         311_1
Mouse 16         311_2
Mouse 17         311_3
Mouse 18         311_4
Mouse 19         311_5
                ...   
Mouse 1060    J1291_11
Mouse 1061    J1291_12
Mouse 1062    J1291_13
Mouse 1063    J1291_14
Mouse 1064    J1291_15
Name: MouseID, Length: 75, dtype: object

Add columns¶

In [48]:
df['HasDownSyndrome'] = has_down_syndrome
df
Out[48]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class HasDownSyndrome
Mouse 0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m False
Mouse 1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m False
Mouse 2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m False
Mouse 3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m False
Mouse 4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s True
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s True
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s True
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s True
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s True

1080 rows × 83 columns

Remove columns¶

In [49]:
del df['HasDownSyndrome']
df
Out[49]:
MouseID DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
Mouse 0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 ... 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
Mouse 1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 ... 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
Mouse 2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 ... 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
Mouse 3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 ... 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
Mouse 4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 ... 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mouse 1075 J3295_11 0.254860 0.463591 0.254860 2.092082 2.600035 0.211736 0.171262 2.483740 0.207317 ... 0.183324 0.374088 0.318782 0.204660 0.328327 1.364823 Ts65Dn Saline S/C t-SC-s
Mouse 1076 J3295_12 0.272198 0.474163 0.251638 2.161390 2.801492 0.251274 0.182496 2.512737 0.216339 ... 0.175674 0.375259 0.325639 0.200415 0.293435 1.364478 Ts65Dn Saline S/C t-SC-s
Mouse 1077 J3295_13 0.228700 0.395179 0.234118 1.733184 2.220852 0.220665 0.161435 1.989723 0.185164 ... 0.158296 0.422121 0.321306 0.229193 0.355213 1.430825 Ts65Dn Saline S/C t-SC-s
Mouse 1078 J3295_14 0.221242 0.412894 0.243974 1.876347 2.384088 0.208897 0.173623 2.086028 0.192044 ... 0.196296 0.397676 0.335936 0.251317 0.365353 1.404031 Ts65Dn Saline S/C t-SC-s
Mouse 1079 J3295_15 0.302626 0.461059 0.256564 2.092790 2.594348 0.251001 0.191811 2.361816 0.223632 ... 0.187556 0.420347 0.335062 0.252995 0.365278 1.370999 Ts65Dn Saline S/C t-SC-s

1080 rows × 82 columns

Visualize columns¶

In [50]:
df['pCFOS'].plot();
No description has been provided for this image

Plots use Matplotlib.

In [51]:
df['pCFOS'].plot()
plt.ylabel('pCFOS')
plt.title('pCFOS column data');
No description has been provided for this image
In [52]:
df[['pCREB', 'pCFOS']].plot()
plt.ylabel('pCFOS')
plt.title('Each trace is a column');
No description has been provided for this image

Group rows & compute something for each group¶

Also see pivot_table

In [53]:
df.groupby('Genotype')
Out[53]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x11eafcd70>
In [54]:
df.groupby('Genotype').mean(numeric_only=True)
Out[54]:
DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB pELK ... SHH BAD BCL2 pS6 pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA
Genotype
Control 0.404756 0.575927 0.321132 2.333654 3.984687 0.229578 0.181515 3.505182 0.207285 1.443155 ... 0.226943 0.157057 0.135478 0.123365 0.132923 0.460376 0.156779 0.189647 0.198516 1.324037
Ts65Dn 0.449480 0.663394 0.316791 2.256362 3.685690 0.237204 0.182220 3.573004 0.218520 1.412411 ... 0.226378 0.158766 0.134017 0.119460 0.128882 0.430088 0.183324 0.176156 0.211409 1.353148

2 rows × 77 columns

In [55]:
genotype_means = df.groupby('Genotype').mean(numeric_only=True)

# plot defaults to plotting each column as a line
genotype_means.plot(legend=None);
No description has been provided for this image
In [56]:
genotype_means.T
Out[56]:
Genotype Control Ts65Dn
DYRK1A 0.404756 0.449480
ITSN1 0.575927 0.663394
BDNF 0.321132 0.316791
NR1 2.333654 2.256362
NR2A 3.984687 3.685690
... ... ...
SYP 0.460376 0.430088
H3AcK18 0.156779 0.183324
EGR1 0.189647 0.176156
H3MeK4 0.198516 0.211409
CaNA 1.324037 1.353148

77 rows × 2 columns

In [57]:
genotype_means.T.plot();
No description has been provided for this image
In [58]:
genotype_means.T.plot.bar();
No description has been provided for this image

Nested groups¶

In [59]:
df.groupby(['Genotype', 'Treatment']).sum(numeric_only=True)
Out[59]:
DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB pELK ... SHH BAD BCL2 pS6 pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA
Genotype Treatment
Control Memantine 113.048904 163.342224 94.524479 679.107326 1165.193447 70.677721 55.763691 1147.877081 60.994917 415.632666 ... 67.707033 36.093774 27.603222 37.183411 36.413589 138.641528 39.322980 43.772525 35.955030 380.042435
Saline 117.662182 164.936101 88.520562 651.075739 1106.078423 60.181752 47.699587 850.076614 57.157384 406.965410 ... 61.650598 31.754763 27.265559 33.134565 35.364678 123.772534 33.579453 41.568827 38.488287 374.658381
Ts65Dn Memantine 128.135872 184.111650 85.562397 617.758208 1028.994680 65.171495 50.619970 998.872821 58.749817 397.631829 ... 60.426136 29.642812 28.212642 33.140195 32.913052 116.010262 34.909445 37.637040 41.285369 361.054093
Saline 99.750587 152.228856 75.050752 526.217556 839.650309 55.091079 41.765377 812.639988 52.040005 318.460521 ... 55.026843 39.420299 24.054167 27.784312 27.016986 103.334536 44.836425 36.348656 50.677370 329.051542

4 rows × 77 columns

In [60]:
groups = df.groupby(['Genotype', 'Treatment']).sum(numeric_only=True)

groups.T
Out[60]:
Genotype Control Ts65Dn
Treatment Memantine Saline Memantine Saline
DYRK1A 113.048904 117.662182 128.135872 99.750587
ITSN1 163.342224 164.936101 184.111650 152.228856
BDNF 94.524479 88.520562 85.562397 75.050752
NR1 679.107326 651.075739 617.758208 526.217556
NR2A 1165.193447 1106.078423 1028.994680 839.650309
... ... ... ... ...
SYP 138.641528 123.772534 116.010262 103.334536
H3AcK18 39.322980 33.579453 34.909445 44.836425
EGR1 43.772525 41.568827 37.637040 36.348656
H3MeK4 35.955030 38.488287 41.285369 50.677370
CaNA 380.042435 374.658381 361.054093 329.051542

77 rows × 4 columns

In [61]:
groups.T.plot();
No description has been provided for this image
In [62]:
groups.T.plot.bar();
No description has been provided for this image
In [63]:
groups.T[:20].plot.bar();
No description has been provided for this image

Sort rows¶

In [64]:
df.sort_values(by='BDNF')[['MouseID', 'Genotype', 'BDNF']]
Out[64]:
MouseID Genotype BDNF
Mouse 181 3413_2 Control 0.115181
Mouse 192 3413_13 Control 0.185980
Mouse 974 3425_15 Ts65Dn 0.194160
Mouse 193 3413_14 Control 0.197271
Mouse 190 3413_11 Control 0.198159
... ... ... ...
Mouse 32 320_3 Control 0.470056
Mouse 375 3497_1 Control 0.497160
Mouse 987 3426_13 Ts65Dn NaN
Mouse 988 3426_14 Ts65Dn NaN
Mouse 989 3426_15 Ts65Dn NaN

1080 rows × 3 columns

In [65]:
# One reason why it is beneficial to attach indexes to each row
# irrespective of their order is it's easy to keep track of
# specific subjects no matter how we rearrange the data.
df.sort_values(by='BDNF')['MouseID']['Mouse 974']
Out[65]:
'3425_15'

Changing the original DataFrame¶

Everything we've done so far has not altered the original DataFrame.

But what if we want to alter it?

  • Option 1: Assign a copy of whatever we did to a new DataFrame.
  • Option 2: Edit the original DataFrame directily by telling Pandas to perform actions inplace.
In [66]:
# df2 is a copy of the sorted df, df is unchanged
df2 = df.sort_values(by='BDNF')
In [67]:
# df2 is itself resorted
df2.sort_values(by='pCFOS', inplace=True)

Editing the data frame¶

Most column manipulations work just like NumPy arrays.

In [68]:
df2['ITSN1'] = df2['DYRK1A'] * 100
df2['BDNF'] = 8

df2[['DYRK1A', 'ITSN1', 'BDNF']]
Out[68]:
DYRK1A ITSN1 BDNF
Mouse 149 0.472849 47.284879 8
Mouse 887 0.880699 88.069855 8
Mouse 272 0.376922 37.692195 8
Mouse 885 0.787914 78.791400 8
Mouse 140 0.633407 63.340684 8
... ... ... ...
Mouse 50 0.457992 45.799236 8
Mouse 21 0.702608 70.260829 8
Mouse 46 0.516108 51.610811 8
Mouse 45 0.560614 56.061437 8
Mouse 47 0.507321 50.732128 8

1080 rows × 3 columns

In [69]:
df2.loc['Mouse 140', 'BDNF'] = 100

df2[['BDNF']]
Out[69]:
BDNF
Mouse 149 8
Mouse 887 8
Mouse 272 8
Mouse 885 8
Mouse 140 100
... ...
Mouse 50 8
Mouse 21 8
Mouse 46 8
Mouse 45 8
Mouse 47 8

1080 rows × 1 columns

In [70]:
df2.loc['Mouse 21':'Mouse 45', 'BDNF'] = 500

df2[['BDNF']]
Out[70]:
BDNF
Mouse 149 8
Mouse 887 8
Mouse 272 8
Mouse 885 8
Mouse 140 100
... ...
Mouse 50 8
Mouse 21 500
Mouse 46 500
Mouse 45 500
Mouse 47 8

1080 rows × 1 columns

NumPy --> Pandas¶

In [71]:
import numpy as np

a = np.random.random([3,4])
a
Out[71]:
array([[0.70046587, 0.86081322, 0.19431979, 0.15695987],
       [0.67005485, 0.87417845, 0.72893385, 0.00119436],
       [0.2056958 , 0.34201637, 0.72300174, 0.0150288 ]])
In [72]:
af = pd.DataFrame(a)
af
Out[72]:
0 1 2 3
0 0.700466 0.860813 0.194320 0.156960
1 0.670055 0.874178 0.728934 0.001194
2 0.205696 0.342016 0.723002 0.015029
In [73]:
af.columns = ['A', 'B', 'C', 'D']
af
Out[73]:
A B C D
0 0.700466 0.860813 0.194320 0.156960
1 0.670055 0.874178 0.728934 0.001194
2 0.205696 0.342016 0.723002 0.015029

Pandas --> NumPy¶

In [74]:
af
Out[74]:
A B C D
0 0.700466 0.860813 0.194320 0.156960
1 0.670055 0.874178 0.728934 0.001194
2 0.205696 0.342016 0.723002 0.015029
In [75]:
af.to_numpy()
Out[75]:
array([[0.70046587, 0.86081322, 0.19431979, 0.15695987],
       [0.67005485, 0.87417845, 0.72893385, 0.00119436],
       [0.2056958 , 0.34201637, 0.72300174, 0.0150288 ]])

Exercise¶

Get the BDNF and ITSN1 columns for the original dataframe df as a $N \times 2$ NumPy array.

In [ ]:
 

Exercise Key¶

Get the BDNF and ITSN1 columns for the original dataframe df as a $N \times 2$ NumPy array.

In [76]:
df[['BDNF', 'ITSN1']].to_numpy()
Out[76]:
array([[0.4301753 , 0.74719322],
       [0.41177034, 0.68906355],
       [0.41830878, 0.7302468 ],
       ...,
       [0.23411809, 0.39517937],
       [0.24397413, 0.41289438],
       [0.25656431, 0.46105919]])

Plotting with Seaborn¶

conda activate neu365
pip install seaborn
In [77]:
import seaborn as sns
In [78]:
sns.relplot(x="BDNF", y="pCAMKII", data=df);
No description has been provided for this image
In [79]:
sns.catplot(x="Genotype", y="pCAMKII", data=df);
No description has been provided for this image
In [80]:
sns.catplot(x="Genotype", y="pCAMKII", kind="box", data=df);
No description has been provided for this image
In [81]:
sns.catplot(x="Genotype", y="pCAMKII", hue="Treatment", kind="box", data=df);
No description has been provided for this image
In [82]:
sns.catplot(x="Genotype", y="pCAMKII", hue="Treatment", col="Behavior", kind="box", data=df);
No description has been provided for this image
In [83]:
sns.catplot(x="Genotype", y="pCAMKII", hue="Treatment", col="Behavior", kind="swarm", data=df);
No description has been provided for this image
In [84]:
sns.catplot(x="Genotype", y="pCAMKII", row="Treatment", col="Behavior", kind="swarm", data=df);
No description has been provided for this image

Exercise¶

Plot box plots comparing BDDNF expression levels for each behavior pardigm under each treatment protocol.

In [ ]:
 

Exercise Key¶

Plot box plots comparing BDDNF expression levels for each behavior pardigm under each treatment protocol.

In [85]:
sns.catplot(x="Behavior", y="BDNF", hue="Treatment", kind="box", data=df);
No description has been provided for this image
In [86]:
sns.pairplot(df.loc[:,'NR2A':'pCAMKII']);
No description has been provided for this image

Correlations amongst columns¶

No description has been provided for this image
In [87]:
df.corr(numeric_only=True)
Out[87]:
DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB pELK ... SHH BAD BCL2 pS6 pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA
DYRK1A 1.000000 0.959512 0.359433 0.294526 0.325063 -0.180959 -0.093737 -0.180169 0.047299 0.791173 ... -0.199363 -0.253175 -0.167358 -0.326577 -0.228488 -0.047246 -0.384816 -0.323397 -0.473734 0.476033
ITSN1 0.959512 1.000000 0.459884 0.422492 0.426182 -0.147775 -0.076507 -0.132870 0.171122 0.780926 ... -0.200833 -0.327136 -0.257745 -0.256573 -0.309593 0.042422 -0.347339 -0.399856 -0.511433 0.517782
BDNF 0.359433 0.459884 1.000000 0.805784 0.749993 0.317537 0.390530 0.246760 0.603857 0.451598 ... -0.080776 -0.186508 -0.146526 0.127703 -0.254218 0.450433 -0.135637 -0.239907 -0.244732 0.262578
NR1 0.294526 0.422492 0.805784 1.000000 0.873889 0.211540 0.244224 0.301151 0.597443 0.416629 ... -0.141423 -0.386577 -0.321790 0.264181 -0.374932 0.473366 -0.234171 -0.319672 -0.351958 0.205463
NR2A 0.325063 0.426182 0.749993 0.873889 1.000000 0.110172 0.111143 0.280733 0.392695 0.409502 ... -0.259166 -0.449268 -0.297453 0.262492 -0.434530 0.400526 -0.314210 -0.346987 -0.454366 0.187292
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
SYP -0.047246 0.042422 0.450433 0.473366 0.400526 0.248531 0.284492 0.244965 0.413060 0.039426 ... 0.056826 0.052303 0.039724 0.423405 -0.039379 1.000000 0.031898 0.098065 -0.010347 0.246107
H3AcK18 -0.384816 -0.347339 -0.135637 -0.234171 -0.314210 0.338456 0.301049 0.193369 0.235879 -0.259559 ... 0.241714 0.465087 0.379046 0.138683 0.286105 0.031898 1.000000 0.332770 0.769093 -0.239835
EGR1 -0.323397 -0.399856 -0.239907 -0.319672 -0.346987 0.280476 0.349281 0.079262 -0.117623 -0.202839 ... 0.378905 0.805637 0.841650 0.239695 0.725289 0.098065 0.332770 1.000000 0.731871 -0.470487
H3MeK4 -0.473734 -0.511433 -0.244732 -0.351958 -0.454366 0.383404 0.402397 0.065615 0.056099 -0.277555 ... 0.324445 0.749225 0.712321 0.222595 0.558868 -0.010347 0.769093 0.731871 1.000000 -0.404005
CaNA 0.476033 0.517782 0.262578 0.205463 0.187292 -0.238567 -0.213691 -0.375797 0.039517 0.233863 ... -0.123215 -0.276626 -0.299910 -0.344369 -0.302378 0.246107 -0.239835 -0.470487 -0.404005 1.000000

77 rows × 77 columns

In [88]:
corr = df.corr(numeric_only=True)

plt.figure(figsize=[10,8])
sns.heatmap(corr);
No description has been provided for this image

Note that the corr() matrix does NOT contain any categorical variables!

heatmapz¶

https://pypi.org/project/heatmapz/

A better way to visualize correlations

In [89]:
# see heatmap.py module file
from heatmap import heatmap, corrplot
In [90]:
plt.figure(figsize=[12,12])
ax = corrplot(corr, size_scale=50)
ax[0].set_xticklabels(ax[0].get_xticklabels(), rotation=90, ha='center');
No description has been provided for this image

Correlations between both numerical AND categorical data!¶

Dython

conda activate neu365
pip install scipy==1.11.4, dython
In [91]:
import dython
In [92]:
result = dython.nominal.associations(df.iloc[:,1:], compute_only=True)  # ignore MouseID column
result['corr']
/opt/miniconda3/envs/neu365/lib/python3.12/site-packages/dython/nominal.py:708: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  inf_nan.loc[columns[i], columns[j]] = _inf_nan_str(ij)
/opt/miniconda3/envs/neu365/lib/python3.12/site-packages/dython/nominal.py:709: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  inf_nan.loc[columns[j], columns[i]] = _inf_nan_str(ji)
Out[92]:
DYRK1A ITSN1 BDNF NR1 NR2A pAKT pBRAF pCAMKII pCREB pELK ... pCFOS SYP H3AcK18 EGR1 H3MeK4 CaNA Genotype Treatment Behavior class
DYRK1A 1.000000 0.959255 0.367778 0.306508 0.335411 -0.147453 -0.058022 -0.164829 0.073716 0.792209 ... -0.177847 -0.045088 -0.227613 -0.043643 -0.225390 0.472178 0.084061 0.006329 0.507845 0.538328
ITSN1 0.959255 1.000000 0.473075 0.437863 0.440253 -0.104235 -0.028655 -0.112148 0.202193 0.785017 ... -0.202560 0.044856 -0.191353 -0.025779 -0.225843 0.510772 0.164719 0.024292 0.485934 0.535317
BDNF 0.367778 0.473075 1.000000 0.826391 0.762199 0.379601 0.456188 0.277158 0.645409 0.473381 ... -0.140422 0.433283 -0.036878 -0.060561 -0.186493 0.241898 0.059476 0.045885 0.154089 0.310029
NR1 0.306508 0.437863 0.826391 1.000000 0.876241 0.284832 0.327402 0.328389 0.640559 0.440822 ... -0.189127 0.454020 -0.086095 -0.028069 -0.196479 0.187222 0.123151 0.045167 0.034652 0.298001
NR2A 0.335411 0.440253 0.762199 0.876241 1.000000 0.163371 0.173304 0.301790 0.431922 0.428873 ... -0.245247 0.395922 -0.178156 -0.071078 -0.275716 0.178642 0.167971 0.017966 0.120696 0.358313
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
CaNA 0.472178 0.510772 0.241898 0.187222 0.178642 -0.234543 -0.208196 -0.374794 0.030708 0.227569 ... -0.254194 0.246107 -0.068643 -0.045902 -0.189644 1.000000 0.045850 0.125451 0.748772 0.781937
Genotype 0.084061 0.164719 0.059476 0.123151 0.167971 0.071788 0.006394 0.017890 0.144414 0.041299 ... 0.103784 0.227713 0.170673 0.028657 0.245490 0.045850 1.000000 0.000000 0.000000 0.997213
Treatment 0.006329 0.024292 0.045885 0.045167 0.017966 0.141788 0.195526 0.193406 0.058549 0.004628 ... 0.008524 0.010898 0.141047 0.061332 0.194178 0.125451 0.000000 1.000000 0.000000 0.997213
Behavior 0.507845 0.485934 0.154089 0.034652 0.120696 0.322728 0.281845 0.475608 0.208261 0.290978 ... 0.261077 0.072460 0.162500 0.176697 0.187514 0.748772 0.000000 0.000000 1.000000 0.997213
class 0.538328 0.535317 0.310029 0.298001 0.358313 0.409956 0.372766 0.560796 0.292861 0.331422 ... 0.357843 0.280641 0.351382 0.205604 0.373513 0.781937 0.997213 0.997213 0.997213 1.000000

81 rows × 81 columns

In [93]:
plt.figure(figsize=[12,12])
ax = corrplot(result['corr'], size_scale=50)
ax[0].set_xticklabels(ax[0].get_xticklabels(), rotation=90, ha='center');
No description has been provided for this image

Exercise¶

Which protein is most correlated with class?

Plot categorical jitter plots comparing the expression level of this protein for each class.

In [ ]:
 

Exercise Key¶

Which protein is most correlated with class?

Plot categorical jitter plots comparing the expression level of this protein for each class.

In [94]:
sns.catplot(x="class", y="SOD1", data=df);
No description has been provided for this image

From the above exercise, it is clear that SOD1 primarily differs based on the Behavior conditioning paradigm.

In [95]:
sns.catplot(x="Behavior", y="SOD1", hue="Genotype", data=df);
No description has been provided for this image

Interactive plots with hvplot¶

conda activate neu365
pip install hvplot
In [96]:
import hvplot.pandas
In [97]:
df['pCFOS'].hvplot()
Out[97]:
In [98]:
df[['pCREB', 'pCFOS']].hvplot()
Out[98]: